An example problem demonstrating how NMFk can be applied to extract features and to classify the sensors observing mixtures of these features.
This type of analysis is related to blind source separation.
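Applying NMFk, we can automatically identify the number of signals mixed in the observed data, estimate the signals and how they are mixed at each sensor, and group the sensors based on the signals they observe.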
NMFk is a code within the SmartTensors framework.
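If NMFk is not installed, first execute in the Julia REPL: import Pkg; Pkg.add("NMFk"); Pkg.add("Mads"); Pkg.add("Gadfly").
import Revise
import NMFk
import Mads
import Gadfly # needed for the Gadfly.pt point sizes passed to NMFk.clusterresults below
import Random
Random.seed!(2021)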
MersenneTwister(2021)
Let us generate 4 signals with a length of 100 (three sinusoids and one random signal); the 100 entries can be considered as 100 time steps:
s1 = (sin.(0.05:0.05:5) .+1) ./ 2
s2 = (sin.(0.3:0.3:30) .+ 1) ./ 2
s3 = (sin.(0.5:0.5:50) .+ 1) ./ 2
s4 = rand(100)
W = [s1 s2 s3 s4]
100×4 Matrix{Float64}:
0.52499 0.64776 0.739713 0.405796
0.549917 0.782321 0.920735 0.0657738
0.574719 0.891663 0.998747 0.398162
0.599335 0.96602 0.954649 0.163816
0.623702 0.998747 0.799236 0.783094
0.64776 0.986924 0.57056 0.134115
0.671449 0.931605 0.324608 0.883121
0.694709 0.837732 0.121599 0.386875
0.717483 0.71369 0.0112349 0.242105
0.739713 0.57056 0.0205379 0.131588
0.761344 0.421127 0.14723 0.085331
0.782321 0.27874 0.360292 0.330099
0.802593 0.156117 0.60756 0.654601
⋮
0.0171135 0.999997 0.747443 0.508222
0.0112349 0.978188 0.925452 0.199709
0.00657807 0.913664 0.999295 0.857753
0.0031545 0.812189 0.950894 0.130975
0.000972781 0.682826 0.792098 0.381099
3.83712e-5 0.537133 0.561787 0.89211
0.000353606 0.388122 0.316347 0.18814
0.0019177 0.249105 0.115873 0.695555
0.00472673 0.1325 0.00944578 0.462331
0.00877369 0.0487227 0.0231237 0.574861
0.0140485 0.00525646 0.153558 0.0919372
0.0205379 0.00598419 0.368813 0.710313
The signals look like this:
Mads.plotseries(W)
Now we can mix the signals in matrix W to produce a data matrix X representing data collected at 10 sensors (e.g., measurement devices or wells at different locations).
Each of the 10 sensors is observing some mixture of the 4 signals in W.
The way the 4 signals are mixed at the sensors is represented by the mixing matrix H.
Let us define the mixing matrix H as:
H = [1 5 0 0 1 1 2 1 0 2; 0 1 1 5 2 1 0 0 2 3; 3 0 0 1 0 1 0 5 4 3; 1 1 4 1 5 0 1 1 5 3]
4×10 Matrix{Int64}:
1 5 0 0 1 1 2 1 0 2
0 1 1 5 2 1 0 0 2 3
3 0 0 1 0 1 0 5 4 3
1 1 4 1 5 0 1 1 5 3
Each column of the H matrix defines how the 4 signals are represented at each sensor.
For example, the first sensor (column 1 above) detects Signals 1, 3 and 4; Signal 2 is missing because H[2,1] is equal to zero.
The second sensor (column 2 above) detects Signals 1, 2 and 4; Signal 3 is missing because H[3,2] is equal to zero.
The entries of the H matrix also define the proportions at which the signals are mixed.
For example, the first sensor (column 1 above) detects Signal 3 three times more strongly than Signals 1 and 4.
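The zero pattern and the relative weights described above can be read directly from the columns of H. A minimal sketch in plain Julia (no NMFk calls; `detected` is just an illustrative variable name):

```julia
# Mixing matrix from the example: rows = signals, columns = sensors
H = [1 5 0 0 1 1 2 1 0 2;
     0 1 1 5 2 1 0 0 2 3;
     3 0 0 1 0 1 0 5 4 3;
     1 1 4 1 5 0 1 1 5 3]

# Signals detected by each sensor: indices of the nonzero entries in its column
detected = [findall(!iszero, H[:, j]) for j in 1:size(H, 2)]
println("sensor 1 detects signals ", detected[1])
println("sensor 2 detects signals ", detected[2])

# Weights of the signals at sensor 1 relative to Signal 1
println("relative weights at sensor 1: ", H[:, 1] ./ H[1, 1])
```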
The data matrix X is formed by multiplying the W and H matrices. X defines the actual data observed.
X = W * H
100×10 Matrix{Float64}:
3.14992 3.6785 2.27095 4.38431 … 4.62935 6.28335 6.42979
3.3779 3.59768 1.04542 4.89812 5.21937 5.57645 6.40633
3.96912 4.16342 2.48431 5.85523 5.96662 7.76913 8.01516
3.6271 4.12651 1.62128 5.94856 5.53639 6.56971 7.45212
3.8045 4.90035 4.13112 6.57607 5.40298 9.10991 8.99064
2.49356 4.35984 1.52339 5.63929 … 3.63468 4.92666 6.37032
2.52839 5.17197 4.46409 5.86575 3.17761 7.57725 7.7609
1.44638 4.69815 2.38523 4.69713 1.68958 4.09623 5.42803
0.993292 4.54321 1.68211 3.82179 1.01576 2.68284 4.33605
0.932914 4.40071 1.09691 3.00493 0.97399 1.88121 3.64748
1.28836 4.31318 0.762451 2.3382 … 1.58282 1.85783 3.48375
2.1933 4.52045 1.59914 2.08409 2.91388 3.64914 4.47204
3.27987 4.82368 2.77452 2.04275 4.49499 6.01548 5.86002
⋮ ⋱
2.76766 1.59379 3.03288 6.25565 4.26255 7.53087 6.80121
2.9873 1.23407 1.77703 6.0161 4.8382 6.65673 6.33252
3.86222 1.80431 4.34468 6.42537 … 5.86081 10.1133 8.32529
2.98681 0.958936 1.33609 5.14281 4.8886 6.08283 5.68848
2.75837 1.06879 2.20722 4.58733 4.34256 6.43954 5.57002
2.57751 1.42943 4.10557 4.13956 3.70108 7.78196 5.97316
1.13754 0.57803 1.14068 2.4451 1.77023 2.98233 2.67853
1.04509 0.954249 3.03132 2.05695 … 1.27684 4.43948 3.18543
0.495395 0.618465 1.98183 1.13428 0.514287 2.61444 1.82229
0.653006 0.667452 2.34817 0.841598 0.699253 3.06425 1.95767
0.566658 0.167436 0.373005 0.271777 0.873773 1.08443 0.780351
1.83729 0.818987 2.84724 1.10905 2.57491 5.03879 3.29641
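Because X = W * H, each column of X is the sum of the columns of W weighted by the corresponding column of H. The sketch below rebuilds the example data in plain Julia and checks this for the first sensor; `x1` is an illustrative name, not part of NMFk.

```julia
import Random

Random.seed!(2021)
# Same four signals as in the example (three sinusoids and one random signal)
s1 = (sin.(0.05:0.05:5) .+ 1) ./ 2
s2 = (sin.(0.3:0.3:30) .+ 1) ./ 2
s3 = (sin.(0.5:0.5:50) .+ 1) ./ 2
s4 = rand(100)
W = [s1 s2 s3 s4]
H = [1 5 0 0 1 1 2 1 0 2; 0 1 1 5 2 1 0 0 2 3; 3 0 0 1 0 1 0 5 4 3; 1 1 4 1 5 0 1 1 5 3]
X = W * H

# Sensor 1 has H[:, 1] = [1, 0, 3, 1], so its data are W1 + 3*W3 + W4
x1 = W[:, 1] .+ 3 .* W[:, 3] .+ W[:, 4]
println("max difference for sensor 1: ", maximum(abs.(X[:, 1] .- x1)))
```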
The data matrix X looks like this:
Mads.plotseries(X; name="Sensors")
Now, let us assume that we only know the data matrix X, and that the W and H matrices are unknown.
We can execute NMFk and analyze the data matrix X.
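NMFk will automatically identify the number of unknown signals mixed in X, the unknown signals themselves (i.e., the W matrix), and how the signals are mixed at each sensor (i.e., the H matrix).
This can be done based only on the information in X: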
nkrange=2:10
We, He, fitquality, robustness, aic, kopt = NMFk.execute(X, nkrange; save=false, method=:simple);
OF: min 563.4561839705091 max 571.0956047569299 mean 566.1096913459744 std 2.5162024019835187 Worst correlation by columns: 0.204840178904257 Worst correlation by rows: 0.6730840865506231 Worst covariance by columns: 0.07619556361296793 Worst covariance by rows: 0.3177632870362544 Worst norm by columns: 0.2062783776100441 Worst norm by rows: 0.5879052371885953
Signals: 2 Fit: 563.4562 Silhouette: 0.9961238 AIC: -133.6657
OF: min 205.10453576810346 max 205.44013709359942 mean 205.2558183299092 std 0.10764194146505947 Worst correlation by columns: 0.765756624884207 Worst correlation by rows: 0.8221459244824038 Worst covariance by columns: 0.09620742799039032 Worst covariance by rows: 0.3340068356918061 Worst norm by columns: 0.20481267664903935 Worst norm by rows: 0.7412498027521249
Signals: 3 Fit: 205.1045 Silhouette: 0.9877389 AIC: -924.2355
OF: min 0.02606110346539826 max 0.3285930894206071 mean 0.08570976712343938 std 0.09054762017310256 Worst correlation by columns: 0.9998437470148135 Worst correlation by rows: 0.9999635107298314 Worst covariance by columns: 0.11034854993765787 Worst covariance by rows: 0.41019527264737216 Worst norm by columns: 0.6303363471078243 Worst norm by rows: 0.5079941550602866
Signals: 4 Fit: 0.0260611 Silhouette: 0.9951292 AIC: -9675.067
OF: min 0.019296683745481904 max 0.1313755817852479 mean 0.05983435689746689 std 0.03358572465839549 Worst correlation by columns: 0.9999051870218388 Worst correlation by rows: 0.9999840409655578 Worst covariance by columns: 0.11049073449732591 Worst covariance by rows: 0.41076142701552415 Worst norm by columns: 0.6614805465202735 Worst norm by rows: 0.5400392690140969
Signals: 5 Fit: 0.01929668 Silhouette: -0.6128532 AIC: -9755.577
OF: min 0.006752373216074524 max 0.20286975450972983 mean 0.04849374109403328 std 0.05784790835383283 Worst correlation by columns: 0.9999482835876661 Worst correlation by rows: 0.9999956205669132 Worst covariance by columns: 0.11043103874232613 Worst covariance by rows: 0.41155827502729214 Worst norm by columns: 0.8228221042701841 Worst norm by rows: 0.5518600983174698
Signals: 6 Fit: 0.006752373 Silhouette: -0.612744 AIC: -10585.62
OF: min 0.0062303074161237084 max 0.04561774488510166 mean 0.021946059024547167 std 0.015762068323271657 Worst correlation by columns: 0.999994988539508 Worst correlation by rows: 0.9999881902662511 Worst covariance by columns: 0.11044002227515037 Worst covariance by rows: 0.4118196452194747 Worst norm by columns: 0.3372663940462135 Worst norm by rows: 0.4914608900380674
Signals: 7 Fit: 0.006230307 Silhouette: -0.7747081 AIC: -10446.08
OF: min 0.0042567264451076345 max 0.04851942247626621 mean 0.02038527154156753 std 0.01431645314970248 Worst correlation by columns: 0.9999933466928541 Worst correlation by rows: 0.9999944996671574 Worst covariance by columns: 0.11043873478859806 Worst covariance by rows: 0.41165807364380347 Worst norm by columns: 0.2545539225774592 Worst norm by rows: 0.6718079734504087
Signals: 8 Fit: 0.004256726 Silhouette: -0.6025868 AIC: -10607.01
OF: min 0.009267875446144248 max 0.046369482106464036 mean 0.022308192866677068 std 0.012614940493108956 Worst correlation by columns: 0.9999818984769164 Worst correlation by rows: 0.9999732994493604 Worst covariance by columns: 0.11042200672761357 Worst covariance by rows: 0.4118426897347687 Worst norm by columns: 0.35746086744005723 Worst norm by rows: 0.5555803324415174
Signals: 9 Fit: 0.009267875 Silhouette: -0.5954714 AIC: -9608.956
OF: min 0.00495255180583985 max 0.018638566799673423 mean 0.012368759426061797 std 0.004788607734740136 Worst correlation by columns: 0.9999939099851183 Worst correlation by rows: 0.9999949650207964 Worst covariance by columns: 0.11040647577170468 Worst covariance by rows: 0.4117908021372535 Worst norm by columns: 0.2926290979379814 Worst norm by rows: 0.6303501719023523
Signals: 10 Fit: 0.004952552 Silhouette: -0.6026156 AIC: -10015.61
Signals: 2 Fit: 563.4562 Silhouette: 0.9961238 AIC: -133.6657
Signals: 3 Fit: 205.1045 Silhouette: 0.9877389 AIC: -924.2355
Signals: 4 Fit: 0.0260611 Silhouette: 0.9951292 AIC: -9675.067
Signals: 5 Fit: 0.01929668 Silhouette: -0.6128532 AIC: -9755.577
Signals: 6 Fit: 0.006752373 Silhouette: -0.612744 AIC: -10585.62
Signals: 7 Fit: 0.006230307 Silhouette: -0.7747081 AIC: -10446.08
Signals: 8 Fit: 0.004256726 Silhouette: -0.6025868 AIC: -10607.01
Signals: 9 Fit: 0.009267875 Silhouette: -0.5954714 AIC: -9608.956
Signals: 10 Fit: 0.004952552 Silhouette: -0.6026156 AIC: -10015.61
┌ Info: Results
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkExecute.jl:15
┌ Info: Optimal solution: 4 signals
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkExecute.jl:20
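NMFk runs many nonnegative matrix factorizations for each candidate number of signals and compares them. To show what a single factorization X ≈ W * H with nonnegative W and H involves, here is a minimal sketch using the classic Lee-Seung multiplicative updates for the squared Frobenius error. This is a swapped-in textbook algorithm, not necessarily what `NMFk.execute` does internally, and `nmf_mu` is a hypothetical helper name.

```julia
using LinearAlgebra
import Random

"""
    nmf_mu(X, k; maxiter=2000, seed=1)

Hypothetical helper (not part of NMFk): factor a nonnegative matrix `X` into
nonnegative `W` (m×k) and `H` (k×n) with Lee-Seung multiplicative updates,
returning `W`, `H`, and the squared Frobenius error.
"""
function nmf_mu(X::AbstractMatrix{<:Real}, k::Integer; maxiter::Integer=2000, seed::Integer=1)
    rng = Random.MersenneTwister(seed)
    m, n = size(X)
    W = rand(rng, m, k)  # random nonnegative initial guesses
    H = rand(rng, k, n)
    ϵ = eps(Float64)     # avoids division by zero
    for _ in 1:maxiter
        # Multiplicative updates keep all entries nonnegative
        H .*= (W' * X) ./ (W' * W * H .+ ϵ)
        W .*= (X * H') ./ (W * H * H' .+ ϵ)
    end
    return W, H, norm(X - W * H)^2
end

# Rebuild the example data (same recipe as above)
Random.seed!(2021)
s1 = (sin.(0.05:0.05:5) .+ 1) ./ 2
s2 = (sin.(0.3:0.3:30) .+ 1) ./ 2
s3 = (sin.(0.5:0.5:50) .+ 1) ./ 2
s4 = rand(100)
H = [1 5 0 0 1 1 2 1 0 2; 0 1 1 5 2 1 0 0 2 3; 3 0 0 1 0 1 0 5 4 3; 1 1 4 1 5 0 1 1 5 3]
X = [s1 s2 s3 s4] * H

for k in 2:5
    _, _, err = nmf_mu(X, k)
    println("k = $k  squared error = ", round(err; digits=4))
end
```

The error should drop sharply once k reaches the true number of signals (4), similar to the Fit column printed by NMFk. Because each run starts from a random guess and can stall in a local minimum, an NMFk-style analysis repeats such factorizations many times and checks how consistent the solutions are.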
NMFk returns the estimated optimal number of signals, kopt, which in this case, as expected, is equal to 4.
A plot of the fit and the robustness is shown below:
NMFk.plot_feature_selecton(nkrange, fitquality, robustness)
Acceptable solutions (the optimal solution and the underfitting ones):
NMFk.getks(nkrange, robustness[nkrange])
3-element Vector{Int64}:
2
3
4
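The selection can be mimicked with the summary values printed by `NMFk.execute`. The sketch below assumes a simple rule: a solution is acceptable when its silhouette (robustness) exceeds a cutoff, and the largest acceptable number of signals is selected. The cutoff of 0.7 is a hypothetical value; NMFk's actual criteria may differ.

```julia
# Values copied from the NMFk.execute summary printed above
ks         = collect(2:10)
silhouette = [0.9961238, 0.9877389, 0.9951292, -0.6128532, -0.612744,
              -0.7747081, -0.6025868, -0.5954714, -0.6026156]
fit        = [563.4562, 205.1045, 0.0260611, 0.01929668, 0.006752373,
              0.006230307, 0.004256726, 0.009267875, 0.004952552]

threshold  = 0.7                            # hypothetical robustness cutoff
acceptable = ks[silhouette .> threshold]    # robust solutions
kbest      = maximum(acceptable)            # largest robust solution
println("acceptable: ", acceptable, "  selected: ", kbest)

# Relative improvement in fit when adding one more signal
ratio = fit[1:end-1] ./ fit[2:end]
kdrop = ks[argmax(ratio) + 1]               # k at which the fit improves the most
println("largest fit improvement when moving to k = ", kdrop)
```

Both views point to 4 signals: beyond k = 4 the fit barely improves while the silhouette collapses, which indicates that the extra signals are not reproducible between runs.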
NMFk also returns estimates of matrices W and H.
Here the estimates of matrices W and H are stored as We and He objects.
We[kopt] and He[kopt] are scaled versions of the original W and H matrices:
We[kopt]
100×4 Matrix{Float64}:
6.98288 9.8499 12.8513 8.35984
7.30996 11.8127 15.9778 0.882809
7.62872 13.5455 17.3301 8.08025
7.97546 14.6198 16.5393 2.98001
8.30801 15.2631 13.8386 16.4947
8.6807 14.9337 9.82891 2.43144
9.00687 14.2667 5.57065 18.8057
9.35944 12.7232 2.03126 8.06176
9.68569 10.7992 0.115306 4.94977
9.97687 8.59067 0.315878 2.53534
10.2451 6.29855 2.56101 1.48994
10.4821 4.18342 6.324 6.74894
10.7024 2.39109 10.6801 13.7316
⋮
0.161788 15.3218 12.8633 10.7629
0.000195678 14.924 15.9681 4.05281
5.17196e-6 14.0734 17.2867 18.289
3.18961e-15 12.3748 16.4477 2.44801
3.61166e-11 10.4437 13.7045 7.95672
2.1087e-7 8.34621 9.71597 19.1721
3.1738e-8 5.93872 5.44963 3.95505
0.00145974 3.94022 1.98462 15.1073
0.0470913 2.11313 0.145647 10.0749
0.0817772 0.858563 0.395428 12.5492
0.165447 0.0975339 2.68436 1.94723
0.19591 0.260951 6.4465 15.3501
He[kopt]
4×10 Matrix{Float64}:
0.0743214 0.373864 0.00599197 … 0.0734625 0.00697259 0.153705
0.00244652 0.0645862 0.066335 0.0039405 0.135276 0.19962
0.173145 0.00390684 0.003374 0.28775 0.233158 0.175627
0.0457141 0.0456666 0.183246 0.0455059 0.227846 0.135591
Note that the order of the columns ('signals') in W and We[kopt] is not expected to match.
Similarly, the order of the rows ('signals') in H and He[kopt] is not expected to match.
The estimated order of the 'signals' may be different every time the code is executed.
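The ordering and scaling ambiguity is inherent to the factorization: for any permutation matrix P and positive diagonal matrix D, (W * P * D) * (inv(D) * P' * H) equals W * H. A small numeric check in plain Julia with random nonnegative factors (the specific `perm` and `D` are arbitrary choices for illustration):

```julia
using LinearAlgebra
import Random

Random.seed!(1)
W = rand(100, 4)  # any nonnegative signals
H = rand(4, 10)   # any nonnegative mixing matrix

perm = [3, 1, 4, 2]                      # new order of the signals
P = Matrix{Float64}(I, 4, 4)[:, perm]    # permutation: (W * P)[:, j] == W[:, perm[j]]
D = Diagonal([2.0, 0.5, 10.0, 3.0])      # positive rescaling of each signal

We = W * P * D          # reordered and rescaled signals
He = inv(D) * P' * H    # compensating change of the mixing matrix

# Both factorizations reproduce the same data (up to rounding)
println(norm(W * H - We * He))
```

Since the data alone cannot distinguish between these factorizations, the estimated signals are normalized before plotting and compared with the originals regardless of order.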
Below are plots providing comparisons between the original and estimated W and H matrices.
A plot of the original signals:
Mads.plotseries(W; title="Original signals")
A plot of the reconstructed signals:
Mads.plotseries(We[kopt] ./ maximum(We[kopt]; dims=1); title="Reconstructed signals")
A plot of the original mixing matrix:
NMFk.plotmatrix(H ./ maximum(H; dims=2); title="Original mixing matrix")
A plot of the reconstructed mixing matrix:
NMFk.plotmatrix(He[kopt] ./ maximum(He[kopt]; dims=2); title="Reconstructed mixing matrix")
The figures above demonstrate an accurate reconstruction of the original W and H matrices (up to the ordering and scaling of the signals).
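The visual comparison can be made quantitative by pairing each estimated signal with the original signal it correlates with best. The sketch assumes Pearson correlation as the matching criterion; `match_signals` is a hypothetical helper, and the "estimate" here is simulated (permuted, rescaled, slightly perturbed columns of W) rather than actual NMFk output.

```julia
using Statistics
import Random

"""
    match_signals(Woriginal, Westimated)

Hypothetical helper: for each column of `Westimated`, return the index of the
column of `Woriginal` with the highest Pearson correlation.
"""
function match_signals(Woriginal::AbstractMatrix, Westimated::AbstractMatrix)
    C = cor(Westimated, Woriginal)  # C[i, j]: estimated column i vs original column j
    return [argmax(C[i, :]) for i in 1:size(C, 1)]
end

Random.seed!(2021)
s1 = (sin.(0.05:0.05:5) .+ 1) ./ 2
s2 = (sin.(0.3:0.3:30) .+ 1) ./ 2
s3 = (sin.(0.5:0.5:50) .+ 1) ./ 2
s4 = rand(100)
W = [s1 s2 s3 s4]

# Simulated estimate: columns permuted, rescaled, and slightly perturbed
Westimated = W[:, [2, 4, 1, 3]] .* [5.0 0.1 2.0 7.0] .+ 0.001 .* rand(100, 4)

println(match_signals(W, Westimated))
```

With the NMFk output, the same call would be `match_signals(W, We[kopt])`. A greedy argmax can assign two estimated signals to the same original one; in that case a one-to-one assignment (for example, the Hungarian algorithm) is the safer choice.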
NMFk results can be further analyzed as demonstrated below:
NMFk.clusterresults(NMFk.getks(nkrange, robustness[nkrange]), We, He, collect(1:100), "s" .* string.(collect(1:10));
    Wcasefilename="times", Hcasefilename="sensors",
    plottimeseries=:W, biplotcolor=:WH, sortmag=false, biplotlabel=:H,
    point_size_nolabel=2Gadfly.pt, point_size_label=4Gadfly.pt)
Signal importance (high->low): [1, 2]
┌ Info: Number of signals: 2
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkPostprocess.jl:164
┌ Info: Sensors (signals=2)
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkPostprocess.jl:168
┌ Warning: type Clustering.KmeansResult{Core.Array{Core.Float64,2},Core.Float64,Core.Int64} not present in workspace; reconstructing
└ @ JLD /Users/vvv/.julia/packages/JLD/JHrZe/src/jld_types.jl:697
┌ Info: Robust k-means analysis results are loaded from file ./Hmatrix-2-2_10-1000.jld!
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkCluster.jl:67
┌ Warning: Procedure to find unique signals could not identify a solution ...
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkCluster.jl:158
┌ Warning: Procedure to find unique signals could not identify a solution ...
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkCluster.jl:158
┌ Warning: type Clustering.KmeansResult{Core.Array{Core.Float64,2},Core.Float64,Core.Int64} not present in workspace; reconstructing
└ @ JLD /Users/vvv/.julia/packages/JLD/JHrZe/src/jld_types.jl:697
2×2 Matrix{Any}:
"s2" 0.816528
"s7" 0.34415
8×2 Matrix{Any}:
"s8" 1.0
"s9" 0.765652
"s10" 0.664984
"s1" 0.620243
"s6" 0.235877
"s4" 0.183688
"s5" 0.0471445
"s3" 0.00955952
┌ Info: Robust k-means analysis results are loaded from file ./Wmatrix-2-2_100-1000.jld!
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkCluster.jl:67
┌ Warning: Procedure to find unique signals could not identify a solution ...
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkCluster.jl:158
┌ Info: Signal B -> A Count: 2
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkPostprocess.jl:275
┌ Info: Signal A -> B Count: 8
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkPostprocess.jl:275
┌ Info: Signal A (S1) (k-means clustering)
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkPostprocess.jl:292
┌ Info: Signal B (S2) (k-means clustering)
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkPostprocess.jl:292
36×2 Matrix{Float64}:
48.0 1.0
7.0 0.90646
50.0 0.811182
23.0 0.80957
49.0 0.80668
45.0 0.799657
72.0 0.792193
22.0 0.781473
32.0 0.75111
24.0 0.738339
33.0 0.698459
62.0 0.698044
73.0 0.693751
⋮
60.0 0.54061
21.0 0.450948
10.0 0.449086
34.0 0.437807
12.0 0.422768
57.0 0.397088
61.0 0.3737
11.0 0.36556
59.0 0.298814
19.0 0.298337
36.0 0.257546
58.0 0.214994
64×2 Matrix{Float64}:
29.0 1.0
28.0 0.994387
41.0 0.966333
16.0 0.954458
15.0 0.947798
40.0 0.932646
53.0 0.932094
27.0 0.923499
3.0 0.920881
66.0 0.916317
54.0 0.907322
42.0 0.900872
52.0 0.891139
⋮
95.0 0.269577
87.0 0.234987
74.0 0.231201
83.0 0.19494
96.0 0.168447
99.0 0.124341
71.0 0.109416
84.0 0.0915553
85.0 0.0873578
98.0 0.0869121
86.0 0.0792189
97.0 0.0621416
┌ Info: Times (signals=2)
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkPostprocess.jl:350
┌ Info: Signal A (S2) Count: 64
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkPostprocess.jl:363
┌ Info: Signal B (S1) Count: 36
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkPostprocess.jl:363
┌ Info: Signal B -> A Count: 36
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkPostprocess.jl:373
┌ Info: Signal A -> B Count: 64
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkPostprocess.jl:373
┌ Info: Signal A (remapped k-means clustering)
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkPostprocess.jl:388
┌ Info: Signal B (remapped k-means clustering)
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkPostprocess.jl:388
Signal importance (high->low): [3, 2, 1]
┌ Info: Number of signals: 3
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkPostprocess.jl:164
┌ Info: Sensors (signals=3)
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkPostprocess.jl:168
┌ Warning: type Clustering.KmeansResult{Core.Array{Core.Float64,2},Core.Float64,Core.Int64} not present in workspace; reconstructing
└ @ JLD /Users/vvv/.julia/packages/JLD/JHrZe/src/jld_types.jl:697
┌ Info: Robust k-means analysis results are loaded from file ./Hmatrix-3-3_10-1000.jld!
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkCluster.jl:67
┌ Warning: Procedure to find unique signals could not identify a solution ...
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkCluster.jl:158
┌ Warning: Procedure to find unique signals could not identify a solution ...
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkCluster.jl:158
┌ Warning: Procedure to find unique signals could not identify a solution ...
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkCluster.jl:158
┌ Warning: type Clustering.KmeansResult{Core.Array{Core.Float64,2},Core.Float64,Core.Int64} not present in workspace; reconstructing
└ @ JLD /Users/vvv/.julia/packages/JLD/JHrZe/src/jld_types.jl:697
┌ Info: Robust k-means analysis results are loaded from file ./Wmatrix-3-3_100-1000.jld!
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkCluster.jl:67
3×2 Matrix{Any}:
"s4" 0.961764
"s5" 0.9239
"s3" 0.675083
5×2 Matrix{Any}:
"s8" 1.0
"s9" 0.804147
"s10" 0.642766
"s1" 0.60861
"s6" 0.221175
2×2 Matrix{Any}:
"s2" 1.0
"s7" 0.419145
┌ Warning: Procedure to find unique signals could not identify a solution ...
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkCluster.jl:158
┌ Info: Signal B -> A Count: 3
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkPostprocess.jl:275
┌ Info: Signal A -> B Count: 5
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkPostprocess.jl:275
┌ Info: Signal C -> C Count: 2
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkPostprocess.jl:275
┌ Info: Signal A (S3) (k-means clustering)
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkPostprocess.jl:292
┌ Info: Signal B (S2) (k-means clustering)
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkPostprocess.jl:292
┌ Info: Signal C (S1) (k-means clustering)
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkPostprocess.jl:292
37×2 Matrix{Float64}:
91.0 0.977447
5.0 0.929754
29.0 0.923258
64.0 0.836318
66.0 0.827266
94.0 0.817911
88.0 0.809029
27.0 0.807474
52.0 0.801024
89.0 0.797584
63.0 0.751079
28.0 0.747757
65.0 0.735548
⋮
100.0 0.447315
75.0 0.447037
76.0 0.370654
2.0 0.366794
95.0 0.303642
78.0 0.294368
54.0 0.240205
42.0 0.237622
81.0 0.212876
82.0 0.151576
99.0 0.061458
79.0 0.0
23×2 Matrix{Float64}:
41.0 1.0
16.0 0.98127
53.0 0.966055
15.0 0.963061
40.0 0.961342
17.0 0.919987
55.0 0.865718
30.0 0.844847
14.0 0.829076
39.0 0.823647
80.0 0.785744
77.0 0.779144
18.0 0.757396
51.0 0.689622
56.0 0.642863
31.0 0.642328
13.0 0.638229
38.0 0.638226
19.0 0.516139
57.0 0.42853
37.0 0.424425
58.0 0.221269
34.0 0.127527
40×2 Matrix{Float64}:
35.0 1.0
33.0 0.999454
32.0 0.993931
23.0 0.970644
22.0 0.965673
48.0 0.936445
24.0 0.930074
20.0 0.903684
45.0 0.893843
25.0 0.852064
49.0 0.845472
47.0 0.841897
21.0 0.82803
⋮
72.0 0.44919
73.0 0.404277
70.0 0.353705
71.0 0.312965
74.0 0.303642
85.0 0.214224
84.0 0.168959
83.0 0.126965
86.0 0.120085
87.0 0.10552
98.0 0.100582
97.0 0.0865212
┌ Info: Times (signals=3)
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkPostprocess.jl:350
┌ Info: Signal A (S1) Count: 40
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkPostprocess.jl:363
┌ Info: Signal B (S3) Count: 37
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkPostprocess.jl:363
┌ Info: Signal C (S2) Count: 23
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkPostprocess.jl:363
┌ Info: Signal B -> A Count: 37
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkPostprocess.jl:373
┌ Info: Signal C -> B Count: 23
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkPostprocess.jl:373
┌ Info: Signal A -> C Count: 40
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkPostprocess.jl:373
┌ Info: Signal A (remapped k-means clustering)
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkPostprocess.jl:388
┌ Info: Signal B (remapped k-means clustering)
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkPostprocess.jl:388
┌ Info: Signal C (remapped k-means clustering)
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkPostprocess.jl:388
Signal importance (high->low): [4, 3, 2, 1]
┌ Info: Number of signals: 4
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkPostprocess.jl:164
┌ Info: Sensors (signals=4)
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkPostprocess.jl:168
┌ Warning: type Clustering.KmeansResult{Core.Array{Core.Float64,2},Core.Float64,Core.Int64} not present in workspace; reconstructing
└ @ JLD /Users/vvv/.julia/packages/JLD/JHrZe/src/jld_types.jl:697
2×2 Matrix{Any}:
"s5" 1.0
"s3" 0.802067
3×2 Matrix{Any}:
"s8" 1.0
"s9" 0.810281
"s1" 0.60172
┌ Info: Robust k-means analysis results are loaded from file ./Hmatrix-4-4_10-1000.jld!
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkCluster.jl:67
┌ Warning: Procedure to find unique signals could not identify a solution ...
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkCluster.jl:158
┌ Warning: Procedure to find unique signals could not identify a solution ...
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkCluster.jl:158
┌ Warning: Procedure to find unique signals could not identify a solution ...
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkCluster.jl:158
┌ Warning: Procedure to find unique signals could not identify a solution ...
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkCluster.jl:158
┌ Warning: type Clustering.KmeansResult{Core.Array{Core.Float64,2},Core.Float64,Core.Int64} not present in workspace; reconstructing
└ @ JLD /Users/vvv/.julia/packages/JLD/JHrZe/src/jld_types.jl:697
┌ Info: Robust k-means analysis results are loaded from file ./Wmatrix-4-4_100-1000.jld!
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkCluster.jl:67
┌ Info: Signal D -> A Count: 2
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkPostprocess.jl:275
┌ Info: Signal A -> B Count: 3
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkPostprocess.jl:275
┌ Info: Signal B -> C Count: 3
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkPostprocess.jl:275
┌ Info: Signal C -> D Count: 2
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkPostprocess.jl:275
┌ Info: Signal A (S4) (k-means clustering)
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkPostprocess.jl:292
┌ Info: Signal B (S3) (k-means clustering)
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkPostprocess.jl:292
┌ Info: Signal C (S2) (k-means clustering)
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkPostprocess.jl:292
3×2 Matrix{Any}:
"s4" 1.0
"s10" 0.605222
"s6" 0.199823
2×2 Matrix{Any}:
"s2" 1.0
"s7" 0.401176
┌ Info: Signal D (S1) (k-means clustering)
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkPostprocess.jl:292
12×2 Matrix{Float64}:
73.0 0.921292
48.0 0.920805
7.0 0.877473
22.0 0.655249
23.0 0.608288
49.0 0.581644
45.0 0.553796
34.0 0.302301
9.0 0.230956
47.0 0.22555
10.0 0.118299
11.0 0.0695206
40×2 Matrix{Float64}:
41.0 1.0
16.0 0.999658
66.0 0.992652
3.0 0.992084
28.0 0.99095
53.0 0.989901
79.0 0.989034
78.0 0.982986
54.0 0.979585
15.0 0.974543
29.0 0.964869
40.0 0.959824
42.0 0.917565
⋮
13.0 0.611396
31.0 0.60331
38.0 0.581349
19.0 0.464894
75.0 0.400071
57.0 0.3924
12.0 0.362026
32.0 0.358336
37.0 0.33514
62.0 0.298027
58.0 0.171558
99.0 0.15367
21×2 Matrix{Float64}:
89.0 1.0
5.0 0.996169
68.0 0.993815
26.0 0.984241
88.0 0.978007
90.0 0.974038
67.0 0.966074
4.0 0.954186
69.0 0.953466
91.0 0.918523
87.0 0.908687
92.0 0.807659
86.0 0.806738
65.0 0.800517
2.0 0.770976
30.0 0.703892
51.0 0.697919
93.0 0.681621
63.0 0.528743
95.0 0.3876
96.0 0.257164
27×2 Matrix{Float64}:
33.0 1.0
35.0 0.993739
36.0 0.99033
25.0 0.977145
24.0 0.970044
21.0 0.938953
20.0 0.921989
44.0 0.905292
46.0 0.879213
50.0 0.799437
8.0 0.69847
6.0 0.647817
59.0 0.596794
⋮
70.0 0.325694
71.0 0.304884
72.0 0.279462
74.0 0.234344
82.0 0.0877105
83.0 0.0760717
84.0 0.0648747
85.0 0.0522789
100.0 0.0146202
98.0 0.00610281
97.0 0.0035143
94.0 1.57367e-8
┌ Info: Times (signals=4)
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkPostprocess.jl:350
┌ Info: Signal A (S3) Count: 40
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkPostprocess.jl:363
┌ Info: Signal B (S1) Count: 27
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkPostprocess.jl:363
┌ Info: Signal C (S2) Count: 21
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkPostprocess.jl:363
┌ Info: Signal D (S4) Count: 12
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkPostprocess.jl:363
┌ Info: Signal D -> A Count: 12
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkPostprocess.jl:373
┌ Info: Signal A -> B Count: 40
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkPostprocess.jl:373
┌ Info: Signal C -> C Count: 21
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkPostprocess.jl:373
┌ Info: Signal B -> D Count: 27
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkPostprocess.jl:373
┌ Info: Signal A (remapped k-means clustering)
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkPostprocess.jl:388
┌ Info: Signal B (remapped k-means clustering)
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkPostprocess.jl:388
┌ Info: Signal C (remapped k-means clustering)
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkPostprocess.jl:388
┌ Info: Signal D (remapped k-means clustering)
└ @ NMFk /Users/vvv/.julia/dev/NMFk/src/NMFkPostprocess.jl:388
([[1, 2], [3, 2, 1], [4, 3, 2, 1]], [['B', 'B', 'B', 'B', 'B', 'B', 'A', 'A', 'A', 'A' … 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B'], ['A', 'A', 'A', 'A', 'A', 'C', 'C', 'C', 'C', 'C' … 'A', 'A', 'A', 'A', 'A', 'A', 'C', 'C', 'A', 'A'], ['B', 'C', 'B', 'C', 'C', 'D', 'A', 'D', 'A', 'A' … 'C', 'C', 'C', 'D', 'C', 'C', 'D', 'D', 'B', 'D']], [['B', 'A', 'B', 'B', 'B', 'B', 'A', 'B', 'B', 'B'], ['B', 'C', 'A', 'A', 'A', 'B', 'C', 'B', 'B', 'B'], ['B', 'D', 'A', 'C', 'A', 'C', 'D', 'B', 'B', 'C']])
The code above performs an analysis of all the acceptable solutions, i.e., the solutions with 2, 3, and 4 extracted features. The solution with 4 features is the optimal one. The solutions with 2 and 3 features are underfitting but also informative.
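One way to see why 2 and 3 features underfit: the example data X = W * H have rank 4, so by the Eckart-Young theorem no rank-2 or rank-3 approximation (nonnegative or not) can reproduce X exactly. The sketch below rebuilds X in plain Julia and uses singular values as a diagnostic; this is an illustration of the rank argument, not the procedure NMFk uses to select the number of signals.

```julia
using LinearAlgebra
import Random

# Rebuild the example data X = W * H
Random.seed!(2021)
s1 = (sin.(0.05:0.05:5) .+ 1) ./ 2
s2 = (sin.(0.3:0.3:30) .+ 1) ./ 2
s3 = (sin.(0.5:0.5:50) .+ 1) ./ 2
s4 = rand(100)
W = [s1 s2 s3 s4]
H = [1 5 0 0 1 1 2 1 0 2; 0 1 1 5 2 1 0 0 2 3; 3 0 0 1 0 1 0 5 4 3; 1 1 4 1 5 0 1 1 5 3]
X = W * H

σ = svdvals(X)                  # singular values, largest first
r = count(>(1e-8 * σ[1]), σ)    # numerical rank with a small relative tolerance
println("numerical rank of X: ", r)

# Eckart-Young: smallest squared error of ANY rank-k approximation of X;
# a nonnegative rank-k factorization cannot do better than this bound
for k in 1:5
    println("k = $k  lower bound on squared error: ", sum(abs2, σ[k+1:end]))
end
```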
The features extracted by the solutions with 2, 3, and 4 signals look like this:
for i = 2:4
Mads.display("times-$i-timeseries.png")
end
The 10 sensors are grouped into 4 groups. The grouping is based on which of the 4 signals is most strongly detected by each sensor. The sensor grouping is listed below:
Mads.display("sensors-4-groups.txt")
Signal A (S4)
s5 1.0
s3 0.802
Signal B (S3)
s8 1.0
s9 0.81
s1 0.602
Signal C (S2)
s4 1.0
s10 0.605
s6 0.2
Signal D (S1)
s2 1.0
s7 0.401
This grouping is based on an analysis of the mixing (attribute) matrix H, shown below.
The grouping process tries to pick the most important signal observed by each sensor.
However, this is challenging when more than one signal is present at a sensor.
Mads.display("sensors-4-labeled-sorted.png")
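The idea of picking the most important signal at each sensor can be illustrated with a much simpler rule than the one NMFk applies: normalize each signal (row) of the known H by its maximum and assign every sensor to the signal with the largest normalized weight. This argmax rule is an assumption for illustration, not NMFk's robust k-means procedure, and `dominant` is a hypothetical name.

```julia
# Mixing matrix from the example (rows = signals, columns = sensors)
H = [1 5 0 0 1 1 2 1 0 2;
     0 1 1 5 2 1 0 0 2 3;
     3 0 0 1 0 1 0 5 4 3;
     1 1 4 1 5 0 1 1 5 3]
sensors = "s" .* string.(1:10)

# Normalize each signal (row) by its maximum
Hn = H ./ maximum(H; dims=2)

# Assign each sensor to the signal with the largest normalized weight
# (argmax returns the first index when there is a tie)
dominant = [argmax(Hn[:, j]) for j in 1:size(Hn, 2)]

for s in 1:size(Hn, 1)
    println("Signal $s: ", sensors[dominant .== s])
end
```

Compared with the NMFk grouping listed above, this rule places s6 and s9 in different groups. Both sensors observe several signals with similar weights (H[:, 6] = [1, 1, 1, 0] and H[:, 9] = [0, 2, 4, 5]), which is exactly the situation where assigning a single dominant signal is ambiguous.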
The grouping of the sensors at the different levels of clustering is visualized below:
Mads.display("sensors-4-labeled-sorted-dendogram.png")
The biplots below show how the 4 extracted features project the sensors and the time-series data. Here, the features are viewed as basis vectors spanning the sensor/time space. Sensors located along the basis vectors (i.e., the plot axes) are the most informative for characterizing the data. Temporal measurements along the plot axes are also the most important for representing the observed processes.
Mads.display("all-4-biplots-original.pdf")
┌ Info: Precompiling ImageMagick [6218d12a-5da1-5696-b52f-db25d2ecc6d1]
└ @ Base loading.jl:1317
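To make "located along the basis vectors" concrete: in a biplot of features a and b, sensor j sits at the point (H[a, j], H[b, j]), and the angle of that point tells how close the sensor is to either axis. The sketch below uses the known H from this example and its first two signals purely for illustration; NMFk's biplots are built from the estimated He[kopt] and may scale the coordinates differently.

```julia
# Mixing matrix from the example (rows = signals/features, columns = sensors)
H = [1 5 0 0 1 1 2 1 0 2;
     0 1 1 5 2 1 0 0 2 3;
     3 0 0 1 0 1 0 5 4 3;
     1 1 4 1 5 0 1 1 5 3]
sensors = "s" .* string.(1:10)

a, b = 1, 2  # features shown on the horizontal and vertical axes
# Angle of each sensor's point: 0° = on the axis of feature a, 90° = on the axis of feature b
angles = [rad2deg(atan(H[b, j], H[a, j])) for j in 1:size(H, 2)]
# Distance from the origin: sensors near the origin carry little of either feature
radii = [hypot(H[a, j], H[b, j]) for j in 1:size(H, 2)]

for j in sortperm(angles)
    println(rpad(sensors[j], 4), "angle = ", round(angles[j]; digits=1),
            "°  radius = ", round(radii[j]; digits=2))
end
```

In this pair of features, s1, s7, and s8 lie on the axis of feature 1, while s3, s4, and s9 lie on the axis of feature 2; sensors in between (such as s6 at 45°) observe a mixture of both.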